Inference from small and big data sets with error rates
Authors
Abstract
Similar resources
On Small Data Sets Revealing Big Differences
We use decision trees and genetic algorithms to analyze the academic performance of students throughout an academic year at a distance learning university. Based on the accuracy of the generated rules, and on cross-examinations of various groups of the same student population, we surprisingly observe that students’ performance is clustered around tutors.
Drawing causal inference from Big Data.
Human society has found the means to collect and store vast amounts of information about every subject imaginable, and is archiving this information in attempts to use it for scientific, utilitarian (e.g., health), and business purposes. These large databases are colloquially termed Big Data. How big Big Data are is of course a matter of perspective, and can range, for example, from the “tiny” a...
Big Data Quality: From Content to Context
Over the last 20 years, and particularly with the advent of Big Data and analytics, Data and Information Quality (DIQ) has remained a fast-growing research area. There are many views and streams in DIQ research, generally aiming at improving the effectiveness of decision making in organizations. Although much research has aimed at clarifying the role of BIG data...
Configurations in sets big and small
When does a given set contain a copy of your favorite pattern (for example, specially arranged points on a line or spiral, or the vertices of a polyhedron)? Does the answer depend on how thin the set is in some quantifiable sense? Problems involving identification of prescribed configurations under varying interpretations of size have been vigorously pursued both in the discrete and continuous ...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not perform well in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
Journal
Journal title: Electronic Journal of Statistics
Year: 2015
ISSN: 1935-7524
DOI: 10.1214/15-ejs1011